Sampling Bias in Estimation of Distribution Algorithms for Genetic Programming Using Prototype Trees
Abstract
Probabilistic models are widely used in evolutionary and related algorithms. In Genetic Programming (GP), the Probabilistic Prototype Tree (PPT) is often used as a model representation. Drift due to sampling bias is a widely recognised problem, and may be serious, particularly in dependent probability models. While this has been closely studied in independent probability models, and more recently in probabilistic dependency models, it has received little attention in systems with strict dependence between probabilistic variables, such as arise in the PPT representation. Here, we investigate this issue and present results suggesting that the drift effect in such models may be particularly severe, so severe as to cast doubt on their scalability. We present a preliminary analysis through a factor representation of the joint probability distribution. We suggest future directions for research aiming to overcome this problem.
Similar resources
Estimation of Tree Biomass at Individual tree, Sample plot and Hybrid Level using Drone Images
Algorithms that convert two-dimensional images to 3D data raise the hope that the structural properties of trees can be extracted from these images. In this study, the accuracy of biomass estimation at the tree, plot, and hybrid levels using UAV images was investigated. In 34.8 ha of Sisangan Forest Park, 854 images were acquired with a quadcopter from an altitude of 100 meters above ground. SF...
Rare-Event Estimation for Dynamic Fault Trees
This article describes the results of developing and using rare-event Monte Carlo simulation algorithms for dynamic fault tree estimation. Fault trees are usually estimated with analytical methods (minimal cut sets, Markov chains, etc.), but complex models with dynamic gates require Monte Carlo simulation combined with importance sampling. Proposed artic...
Optimization of sediment rating curve coefficients using evolutionary algorithms and unsupervised artificial neural network
The sediment rating curve (SRC) is a conventional and common regression model for estimating the suspended sediment load (SSL) from flow discharge. However, in most cases the log-transformation of data in SRC models causes a bias that underestimates the SSL prediction. In this study, using the daily stream flow and suspended sediment load data from Shalman hydrometric station on Shalmanroud River, Guilan Pro...
Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest
Sampling methods have a theoretical basis and should be operational in different forests; therefore, selecting an appropriate sampling method is important for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...
Spatiotemporal Estimation of PM2.5 Concentration Using Remotely Sensed Data, Machine Learning, and Optimization Algorithms
PM2.5 (particles <2.5 μm in aerodynamic diameter) can be measured by ground stations in urban areas, but the number of these stations and their geographical coverage are limited. Therefore, these data are not adequate for calculating concentrations of PM2.5 over a large urban area. This study aims to use Aerosol Optical Depth (AOD) satellite images and meteorological data from 2014 to 2017 ...